# Vietnamese speech recognition

Whisper Small Vi

An automatic speech recognition model fine-tuned from openai/whisper-small on Vietnamese speech data to improve Vietnamese transcription accuracy and robustness.

Speech Recognition

Transformers Other
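
Several entries on this page describe improved transcription accuracy. For speech recognition this is commonly quantified as word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration, assuming whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical and not part of any model listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("chào" -> "chao") and one deletion ("nam")
    # against a four-word reference.
    print(word_error_rate("xin chào việt nam", "xin chao việt"))
```

Because Vietnamese is written with spaces between syllables, whitespace tokenization scores errors per syllable here. Comparing models fairly would also require identical normalization of casing and punctuation on both transcripts.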

Whisper Base Vi

A speech recognition model fine-tuned from openai/whisper-base on 100 hours of Vietnamese speech data to improve Vietnamese transcription accuracy.

Speech Recognition

Transformers Other

Chunkformer Large Vie

A large-scale Vietnamese automatic speech recognition model based on the ChunkFormer architecture, fine-tuned on approximately 3000 hours of publicly available Vietnamese speech data, with excellent performance.

Speech Recognition

Vi Whisper Large V3 Turbo V1

A Whisper-V3-Turbo model optimized for Vietnamese automatic speech recognition (ASR), fine-tuned on multiple Vietnamese datasets.

Speech Recognition

Transformers Other

Viwhisper Medium

A Whisper-medium model optimized for Vietnamese speech recognition, fine-tuned on 1,308 hours of Vietnamese data.

Speech Recognition

Transformers Other

Whisper Tiny Vi

A Vietnamese automatic speech recognition (ASR) model fine-tuned from the OpenAI Whisper-tiny architecture, reported to perform well on multiple Vietnamese datasets.

Speech Recognition

Transformers Other

Phowhisper Medium

PhoWhisper is a series of models designed specifically for Vietnamese automatic speech recognition (ASR). It achieves high robustness by fine-tuning the Whisper model on an 844-hour dataset covering diverse Vietnamese accents.

Speech Recognition

Transformers Other

Phowhisper Small

PhoWhisper is a system designed specifically for Vietnamese automatic speech recognition, fine-tuned from the Whisper model and supporting various Vietnamese accents.

Speech Recognition

Transformers Other

Wav2vec2 Bartpho

This is an automatic speech recognition model supporting Vietnamese, capable of outputting normalized text, timestamp labeling, and multi-speaker segmentation.

Speech Recognition

Transformers Other

Whisper Large V2 Vietnamese

This model is an automatic speech recognition (ASR) model based on OpenAI's Whisper Small architecture, fine-tuned on the Common Voice 11.0 Vietnamese dataset

Speech Recognition

Transformers Other

Wav2vec2 Large Vi Vlsp2020

Vietnamese automatic speech recognition model based on wav2vec2 architecture, pre-trained with 13,000 hours of unlabeled YouTube audio and fine-tuned on 250 hours of labeled data

Speech Recognition

Transformers Other

Wav2vec2 Base Vietnamese 160h

Vietnamese speech recognition model based on Wav2vec2, fine-tuned on 160 hours of Vietnamese speech data

Speech Recognition

Transformers Other

Viwav2vec2 Base 3k

A Wav2Vec2 base model pre-trained on 3,000 hours of Vietnamese speech data. It is suitable for Vietnamese speech recognition but must be fine-tuned on a downstream task before use.

Speech Recognition

Transformers Other

Viwav2vec2 Base 1.5k

A model pre-trained on 1,500 hours of Vietnamese speech data. It is suitable for Vietnamese speech recognition but must be fine-tuned before use.

Speech Recognition

Transformers Other

Wav2vec NCKH 2022

Vietnamese automatic speech recognition model based on Wav2vec2 architecture, supporting audio-to-text conversion

Speech Recognition

Transformers Other

Wav2vec2 Large Xls R 300m Vietnamese Colab

A Vietnamese speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice dataset.

Speech Recognition

Wav2vec2 Base Vietnamese

Vietnamese speech recognition model based on the Wav2Vec2 architecture, fine-tuned on the VLSP dataset; accepts speech input sampled at 16 kHz.

Speech Recognition

Transformers Other
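
Entries that specify 16 kHz input generally assume the audio has already been recorded at or resampled to that rate. As a minimal sketch, assuming uncompressed WAV files and using only Python's standard `wave` module, the sample rate of a file can be checked before it is passed to a model. The helper names `wav_sample_rate` and `is_16khz` are hypothetical, and resampling itself is out of scope here.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path: str, expected_hz: int = 16000) -> bool:
    """Return True if the WAV file matches the expected sample rate."""
    return wav_sample_rate(path) == expected_hz
```

A file that fails this check would need to be resampled with an audio library of your choice before transcription.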

Fb Youtube Vi Large

An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on informal Vietnamese YouTube audio datasets.

Speech Recognition

A Vietnamese automatic speech recognition model fine-tuned from microsoft/wavlm-base-plus on the PHONGDTD/VINDATAVLSP - NA dataset.

Speech Recognition

Wavlm Vindata Demo Dist

An automatic speech recognition model fine-tuned from microsoft/wavlm-base on Vietnamese datasets.

Speech Recognition

Wav2vec2 Base Vn 270h

A speech recognition model fine-tuned with approximately 270 hours of Vietnamese annotated data, supporting Vietnamese automatic speech recognition tasks

Speech Recognition Other

A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice 7.0 Vietnamese dataset and private datasets.

Speech Recognition

Transformers Other

Fb Vindata Vi Large

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the PHONGDTD/VINDATAVLSP - NA dataset.

Speech Recognition

Xls Asr Vi 40h 1B

Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-1b on 40 hours of data from the FPT Open Speech Dataset (FOSD) and the Common Voice 7.0 dataset.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Vietnamese

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained using the Common Voice dataset.

Speech Recognition Other

Wav2vec2 Base Vietnamese 250h

Vietnamese automatic speech recognition model based on the wav2vec 2.0 architecture, pre-trained on 13,000 hours of unlabeled audio and fine-tuned on 250 hours of labeled data.

Speech Recognition

Transformers Other

Fine Tune XLSR Wav2Vec2 Speech2Text Vietnamese

This is a Vietnamese automatic speech recognition (ASR) repair model based on the MT5 architecture, fine-tuned for Vietnamese speech recognition tasks.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Vietnamese

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, accepting audio input sampled at 16 kHz.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Vietnamese

A Vietnamese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice and Infore_25h datasets.

Speech Recognition Other

Wav2vec2 Large Xlsr Vietnamese

Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53

Speech Recognition Other

© 2025 AIbase